## [1] "The header of lung cancers: "
## x y lon lat
## 1 35900 41730 -2.621759 53.65058
## 2 35310 42690 -2.712468 53.73637
## 3 35340 42250 -2.707256 53.69685
## 4 35900 41760 -2.621798 53.65328
## 5 35780 41620 -2.639762 53.64060
## 6 35320 42700 -2.710967 53.73727
## [1] 978
## [1] "The number of lung cancers is 978"
## [1] "The header of larynx cancers: "
## x y lon lat
## 1 35320 42800 -2.711119 53.74626
## 2 35310 42230 -2.711769 53.69502
## 3 34910 41850 -2.771718 53.66050
## 4 35260 42080 -2.719111 53.68150
## 5 35300 42150 -2.713162 53.68782
## 6 35230 42660 -2.724548 53.73360
## [1] 58
## [1] "The number of larynx cancers is 58"
## [1] "The header of incinerator: "
## x y lon lat
## 1 35450 41360 -2.689291 53.61696
## [1] 1
## [1] "The number of incinerator is 1"
The general geographic location of data is at Leyland, Chorley and Penwortham. From the bounding box, the polygon “window” (the green box) encloses all observed points, spanning roughly 15–20 km in both the north–south and east–west directions
Lung Cancer (red points): Their count is the largest and scattered throughout the whole domain. They are clustered around the centres of Leyland, Chorley and Penwortham
Larynx Cancer (blue points): These points are fewer. There are less obvious clusters around the centres of Leyland, Chorley and Penwortham
Incinerator (black point): There is only one incinerator, and it is situated near the southwestern boundary of the bounding window. At first glance, there is not any obvious pattern of cancer cases around the incinerator.
Assumption: Points occur randomly and independently with a constant rate λ
Intensity: λ(x,y) = constant.
Assumption: Points still occur independently. The number of events in two non-overlapping regions are independent, regardless of differences in intensity.
Intensity: λ(s) = \(\hat{\lambda}(s) = \frac{1}{h^2} \sum_i \kappa\left(\frac{\|s - s_i\|}{h}\right) / q(\|s\|)\), a known function of location
Assumptions: Points are Poisson conditional on a latent Gaussian field Z(x), so the log-intensity is random and spatially correlated. Cox processes are considered doubly stochastic, intensity is heterogeneous, but also may be a random quantity
Intensity: λ(x)=exp(Z(x)). Often leads to clustering.
Assumption: Points are placed sequentially; each new point must be at least ‘r’ away from existing points.
Intensity: No simple closed form
What is CRS:
CRS means uniform distribution. Any location has equal probability of containing a point.
CRS is modeled by a homogeneous Poisson process, with constant rate parameter.
As for independence, the occurrence of one point does not have any influence on the likelihood of a point in another locations.
Given that there are N points of the Poisson process in area A, these N points are conditionally independent and uniformly distributed in A
As we can see, empirical values is a lot larger than our theoretical ones, and lie above and outside the envelope. This suggests a clustering pattern.
As we can see, empirical values is larger than our theoretical ones, and lie above and outside the envelope . This means there is clustering.
As we can see, empirical values is above our theoretical ones, and almost within the envelope. This suggest a relatively clustering pattern.
As we can see, empirical values is above our theoretical ones, and almost within the envelope. This suggest a relatively random pattern.
Ripley’s K-function:
K(r) tests the expected number of events within distance h from an arbitrary event (excluding the chosen event itself) divided by the average number of events per unit area. K(r) = E(N0(r))/λ
For a process that is more regular than CSR we expect fewer events within distance r of a randomly chosen event. For a process that is more clustered than CSR we expect more events within distance r of a randomly chosen event
K(r) is equivalent to showing the variance of the number of events occurring in subregion A
Under CSR K(r) = πr2, the area of a circle of radius r
Deviations above the horizontal line indicate clustering because there are more events within distance h than expected. Deviations below the horizontal line indicate regularity because there are fewer events within distance h than expected.
L version: plot \((h) \text{ vs } \hat{L}(h) - h \text{ where } \hat{L}(h) = \left(\frac{\hat{K}(h)_{ec}}{\pi}\right)^{1/2}\)
G function:
##
## Conditional Monte Carlo test of CSR using quadrat counts
## Test statistic: Pearson X2 statistic
##
## data:
## X2 = 1854, p-value = 0.001
## alternative hypothesis: two.sided
##
## Quadrats: 59 tiles (irregular windows)
##
## Conditional Monte Carlo test of CSR using quadrat counts
## Test statistic: Pearson X2 statistic
##
## data:
## X2 = 1854, p-value = 5e-04
## alternative hypothesis: clustered
##
## Quadrats: 59 tiles (irregular windows)
##
## Conditional Monte Carlo test of CSR using quadrat counts
## Test statistic: Pearson X2 statistic
##
## data:
## X2 = 1854, p-value = 1
## alternative hypothesis: regular
##
## Quadrats: 59 tiles (irregular windows)
| Uniformly Distributed | Clustered | Regular Psrocesses | |
|---|---|---|---|
| Result | X2 = 1854 p-value = 0.001 |
X2 = 1854 p-value = 5e-04 |
X2 = 1854 p-value = 1 |
| Interpretation | The null hypothesis, that points follow a uniform Poisson process is rejected. | The alternative hypothesis, that points follow a clustering pattern is proved. So there is significant clustered pattern | The alternative hypothesis, that points follow a regular pattern is not proved. So there is NO significant regular pattern |
##
## Conditional Monte Carlo test of CSR using quadrat counts
## Test statistic: Pearson X2 statistic
##
## data:
## X2 = 100.64, p-value = 0.058
## alternative hypothesis: two.sided
##
## Quadrats: 59 tiles (irregular windows)
##
## Conditional Monte Carlo test of CSR using quadrat counts
## Test statistic: Pearson X2 statistic
##
## data:
## X2 = 100.64, p-value = 0.0315
## alternative hypothesis: clustered
##
## Quadrats: 59 tiles (irregular windows)
##
## Conditional Monte Carlo test of CSR using quadrat counts
## Test statistic: Pearson X2 statistic
##
## data:
## X2 = 100.64, p-value = 0.975
## alternative hypothesis: regular
##
## Quadrats: 59 tiles (irregular windows)
| Uniformly Distributed | Clustered | Regular Psrocesses | |
|---|---|---|---|
| Result | X2 = 100.64 p-value = 0.066 |
X2 = 100.64 p-value = 0.0255 |
X2 = 100.64 p-value = 0.975 |
| Interpretation | The null hypothesis, that points follow a uniform Poisson process is rejected at significant level of 7.5% | The alternative hypothesis, that points follow a clustering pattern is proved. So there is significant clustered pattern | The alternative hypothesis, that points follow a regular pattern is not proved. So there is no significant regular pattern |
At first, we need to know a Dirichlet tessellation is able to partition the plane so that each point’s “cell” includes all locations closer to it than to any other point. This can reveal how points partition their surrounding space.
Large cells imply that point is relatively isolated; small cells imply a local cluster. there are several areas of many small cells, meaning clusters. These clusters of small cells match the statistical evidence of Quadrat.test, K-function, and G-function
There are no strong evidence of concentration areas of many small cells. This result match with result of K-function and G-function, which indicate weaker or minimal clustering relative to lung cancer. However, this result does NOT align with the Quadrat.test result. One potentail reason is that Quadrat.test is more statistically rigorous, while Dirichlet Tesselation relies on eyeballing.
There are a few hot spots, indicating the local maxima of the kernel density. These hot spots represents the region with a relatively higher concentration of lung cancer cases, in other words, clustering. This results align with the results of K-function, G-function, and Quadrat.test.
## [1] -3.015641e-20 3.085959e-04
The maximum and minimum values of the intensity estimate over the study region is -3.0156e-20 to 3.0859e-04. The large difference between maximum and minimum indicates the existence of clustering
## sigma
## 72.77397
## real-valued pixel image
## 128 x 128 pixel array (ny, nx)
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## dimensions of each pixel: 137 x 138.2812 units
## Image is defined on a subset of the rectangular grid
## Subset area = 227037826.538086 square units
## Subset area fraction = 0.733
## Pixel values (inside window):
## range = [-3.015641e-20, 0.0003085959]
## integral = 973.8974
## mean = 4.289582e-06
The optimal bandwidth is 72.77397, and 4.2895e-06 is the average intensity (points per square unit) inside the window. The value integral(973.8974) is approximately equal to the number of lung cancers, which means correct implementation.
There are a few hot spots, indicating the local maxima of the kernel density. These hot spots represents the region with a relatively higher concentration of larynx cancer cases, in other words, clustering. This results align with the results of K-function, G-function, and Quadrat.test.
## [1] 0.000000e+00 1.810377e-06
The maximum and minimum values of the intensity estimate over the study region is 0.00 to 1.8103e-06. The large difference between maximum and minimum indicates the existence of clustering
## sigma
## 642.1233
## real-valued pixel image
## 128 x 128 pixel array (ny, nx)
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## dimensions of each pixel: 137 x 138.2812 units
## Image is defined on a subset of the rectangular grid
## Subset area = 227037826.538086 square units
## Subset area fraction = 0.733
## Pixel values (inside window):
## range = [0, 1.810377e-06]
## integral = 58.59965
## mean = 2.581052e-07
The optimal bandwidth is 642.1233, and 2.5810e-07 is the average intensity (points per square unit) inside the window. Also, the value integral(58.59965) is approximately equal to the number of lung cancers, which means correct implementation.
## Point process model
## Fitted to data: lung_ppp
## Fitting method: maximum likelihood
## Model was fitted analytically
## Call:
## ppm.formula(Q = lung_ppp ~ 1, interaction = Poisson())
## Edge correction: "border"
## [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
##
## Data pattern:
## Planar point pattern: 978 points
## Average intensity 4.31e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
##
## Dummy quadrature points:
## 70 x 70 grid of dummy points, plus 4 corner points
## dummy spacing: 250.0000 x 252.8571 units
##
## Original dummy parameters: =
## Planar point pattern: 3659 points
## Average intensity 1.61e-05 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
## (counting weights based on 70 x 70 array of rectangular tiles)
## All weights:
## range: [4270, 63200] total: 2.27e+08
## Weights on data points:
## range: [4270, 31600] total: 17800000
## Weights on dummy points:
## range: [4270, 63200] total: 2.09e+08
## --------------------------------------------------------------------------------
## FITTED :
##
## Stationary Poisson process
##
## ---- Intensity: ----
##
##
## Uniform intensity:
## [1] 4.308655e-06
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## log(lambda) -12.35488 0.03197647 -12.41756 -12.29221 *** -386.3742
##
## ----------- gory details -----
##
## Fitted regular parameters (theta):
## log(lambda)
## -12.35488
##
## Fitted exp(theta):
## log(lambda)
## 4.308655e-06
## log(lambda)
## -12.35488
## Nothing plotted -- all plots selected are flat surfaces.
## [1] 26124.15
Since the intensity is constant, the homogeneous Poisson point possess is being fit.
The uniform intensity is 4.30e-06, meaning you expect 4.30e-06 lung cancer cases per unit area
AIC is 26124.15
## Point process model
## Fitted to data: lung_ppp
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.ppp(Q = lung_ppp, trend = ~log(x) + log(y), interaction = Poisson())
## Edge correction: "border"
## [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
##
## Data pattern:
## Planar point pattern: 978 points
## Average intensity 4.31e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
##
## Dummy quadrature points:
## 70 x 70 grid of dummy points, plus 4 corner points
## dummy spacing: 250.0000 x 252.8571 units
##
## Original dummy parameters: =
## Planar point pattern: 3659 points
## Average intensity 1.61e-05 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
## (counting weights based on 70 x 70 array of rectangular tiles)
## All weights:
## range: [4270, 63200] total: 2.27e+08
## Weights on data points:
## range: [4270, 31600] total: 17800000
## Weights on dummy points:
## range: [4270, 63200] total: 2.09e+08
## --------------------------------------------------------------------------------
## FITTED :
##
## Nonstationary Poisson process
##
## ---- Intensity: ----
##
## Log intensity: ~log(x) + log(y)
##
## Fitted trend coefficients:
## (Intercept) log(x) log(y)
## 92.598926 -7.148775 -1.048353
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) 92.598926 54.031934 -13.301718 198.499571 1.7137815
## log(x) -7.148775 2.653131 -12.348817 -1.948733 ** -2.6944670
## log(y) -1.048353 3.114080 -7.151839 5.055132 -0.3366494
##
## ----------- gory details -----
##
## Fitted regular parameters (theta):
## (Intercept) log(x) log(y)
## 92.598926 -7.148775 -1.048353
##
## Fitted exp(theta):
## (Intercept) log(x) log(y)
## 1.641356e+40 7.858262e-04 3.505145e-01
## (Intercept) log(x) log(y)
## 92.598926 -7.148775 -1.048353
## [1] 26118.86
\(\lambda(x, y)\) is modeled as \(\beta_0 + \beta_1 \cdot x + \beta_2 \cdot y\). This is an inhomogeneous Poisson process with a linear intensity in terms of \(x\) and \(y\).
AIC is 26118.86
Only ‘X’ shows a significant Ztest, meaning spatial intensity is significantly related to the X-direction.
## Point process model
## Fitted to data: lung_ppp
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.ppp(Q = lung_ppp, trend = ~incin_dist, interaction = Poisson())
## Edge correction: "border"
## [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
##
## Data pattern:
## Planar point pattern: 978 points
## Average intensity 4.31e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
##
## Dummy quadrature points:
## 70 x 70 grid of dummy points, plus 4 corner points
## dummy spacing: 250.0000 x 252.8571 units
##
## Original dummy parameters: =
## Planar point pattern: 3659 points
## Average intensity 1.61e-05 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
## (counting weights based on 70 x 70 array of rectangular tiles)
## All weights:
## range: [4270, 63200] total: 2.27e+08
## Weights on data points:
## range: [4270, 31600] total: 17800000
## Weights on dummy points:
## range: [4270, 63200] total: 2.09e+08
## --------------------------------------------------------------------------------
## FITTED :
##
## Nonstationary Poisson process
##
## ---- Intensity: ----
##
## Log intensity: ~incin_dist
## Model depends on external covariate 'incin_dist'
## Covariates provided:
## incin_dist: distfun
##
## Fitted trend coefficients:
## (Intercept) incin_dist
## -5.126593e+00 -1.453328e-05
##
## Estimate S.E. CI95.lo CI95.hi Ztest
## (Intercept) -5.126593e+00 3.778934e+00 -1.253317e+01 2.279981e+00
## incin_dist -1.453328e-05 7.600753e-06 -2.943049e-05 3.639179e-07
## Zval
## (Intercept) -1.356624
## incin_dist -1.912085
##
## ----------- gory details -----
##
## Fitted regular parameters (theta):
## (Intercept) incin_dist
## -5.126593e+00 -1.453328e-05
##
## Fitted exp(theta):
## (Intercept) incin_dist
## 0.005936754 0.999985467
## (Intercept) incin_dist
## -5.126593e+00 -1.453328e-05
## [1] 26120.46
Since incinerator is a single point, we need to create a square window from (xi - r, yi - r) to (xi + r, yi + r), where r is 20 meters. In this way, we can approximate a polygon.
Here, \(\lambda(x, y)\) is modeled as \(\exp(\beta_0 + \beta_1 \cdot \text{dist_to_incin}(x, y))\).
This is an inhomogeneous Poisson model.
The parameter estimate is -1.453328e-05, but is NOT statistically significant. There is no strong evidence that intensity will increase or decrease with distance. In other words, it suggests the distance from the incinerator does not affect the intensity of lung cancers.
## Point process model
## Fitted to data: Larynx_ppp
## Fitting method: maximum likelihood
## Model was fitted analytically
## Call:
## ppm.formula(Q = Larynx_ppp ~ 1, interaction = Poisson())
## Edge correction: "border"
## [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
##
## Data pattern:
## Planar point pattern: 58 points
## Average intensity 2.56e-07 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
##
## Dummy quadrature points:
## 32 x 32 grid of dummy points, plus 4 corner points
## dummy spacing: 546.875 x 553.125 units
##
## Original dummy parameters: =
## Planar point pattern: 794 points
## Average intensity 3.5e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
## (counting weights based on 32 x 32 array of rectangular tiles)
## All weights:
## range: [9700, 302000] total: 2.27e+08
## Weights on data points:
## range: [60500, 151000] total: 7960000
## Weights on dummy points:
## range: [9700, 302000] total: 2.19e+08
## --------------------------------------------------------------------------------
## FITTED :
##
## Stationary Poisson process
##
## ---- Intensity: ----
##
##
## Uniform intensity:
## [1] 2.555235e-07
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## log(lambda) -15.17995 0.1313064 -15.43731 -14.9226 *** -115.6071
##
## ----------- gory details -----
##
## Fitted regular parameters (theta):
## log(lambda)
## -15.17995
##
## Fitted exp(theta):
## log(lambda)
## 2.555235e-07
## log(lambda)
## -15.17995
## Nothing plotted -- all plots selected are flat surfaces.
## [1] 1878.874
Since the intensity is constant, the homogeneous Poisson point possess is being fit.
The uniform intensity is 2.55e-07, meaning you expect 2.55e-07 lung cancer cases per unit area
AIC is 1878.874
## Point process model
## Fitted to data: Larynx_ppp
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.ppp(Q = Larynx_ppp, trend = ~log(x) + log(y), interaction = Poisson())
## Edge correction: "border"
## [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
##
## Data pattern:
## Planar point pattern: 58 points
## Average intensity 2.56e-07 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
##
## Dummy quadrature points:
## 32 x 32 grid of dummy points, plus 4 corner points
## dummy spacing: 546.875 x 553.125 units
##
## Original dummy parameters: =
## Planar point pattern: 794 points
## Average intensity 3.5e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
## (counting weights based on 32 x 32 array of rectangular tiles)
## All weights:
## range: [9700, 302000] total: 2.27e+08
## Weights on data points:
## range: [60500, 151000] total: 7960000
## Weights on dummy points:
## range: [9700, 302000] total: 2.19e+08
## --------------------------------------------------------------------------------
## FITTED :
##
## Nonstationary Poisson process
##
## ---- Intensity: ----
##
## Log intensity: ~log(x) + log(y)
##
## Fitted trend coefficients:
## (Intercept) log(x) log(y)
## 260.569833 -18.876038 -2.663316
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) 260.569833 226.54444 -183.44911 704.588772 1.1501930
## log(x) -18.876038 10.88638 -40.21294 2.460868 -1.7339137
## log(y) -2.663316 13.01807 -28.17826 22.851626 -0.2045862
##
## ----------- gory details -----
##
## Fitted regular parameters (theta):
## (Intercept) log(x) log(y)
## 260.569833 -18.876038 -2.663316
##
## Fitted exp(theta):
## (Intercept) log(x) log(y)
## 1.458951e+113 6.342211e-09 6.971664e-02
## (Intercept) log(x) log(y)
## 260.569833 -18.876038 -2.663316
## [1] 1879.827
\(\lambda(x, y)\) is modeled as \(\beta_0 + \beta_1 \cdot x + \beta_2 \cdot y\). This is an inhomogeneous Poisson process with a linear intensity in terms of \(x\) and \(y\).
AIC is 1879.827
No covariate shows a significant Ztest, meaning spatial intensity is NOT significantly related to both the X-direction and Y-direction.
## Point process model
## Fitted to data: Larynx_ppp
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.ppp(Q = Larynx_ppp, trend = ~incin_dist, interaction = Poisson())
## Edge correction: "border"
## [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
##
## Data pattern:
## Planar point pattern: 58 points
## Average intensity 2.56e-07 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
##
## Dummy quadrature points:
## 32 x 32 grid of dummy points, plus 4 corner points
## dummy spacing: 546.875 x 553.125 units
##
## Original dummy parameters: =
## Planar point pattern: 794 points
## Average intensity 3.5e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
## (counting weights based on 32 x 32 array of rectangular tiles)
## All weights:
## range: [9700, 302000] total: 2.27e+08
## Weights on data points:
## range: [60500, 151000] total: 7960000
## Weights on dummy points:
## range: [9700, 302000] total: 2.19e+08
## --------------------------------------------------------------------------------
## FITTED :
##
## Nonstationary Poisson process
##
## ---- Intensity: ----
##
## Log intensity: ~incin_dist
## Model depends on external covariate 'incin_dist'
## Covariates provided:
## incin_dist: distfun
##
## Fitted trend coefficients:
## (Intercept) incin_dist
## 3.8906729971 -0.0000383648
##
## Estimate S.E. CI95.lo CI95.hi Ztest
## (Intercept) 3.8906729971 1.578276e+01 -2.704297e+01 3.482431e+01
## incin_dist -0.0000383648 3.177121e-05 -1.006352e-04 2.390562e-05
## Zval
## (Intercept) 0.2465141
## incin_dist -1.2075336
##
## ----------- gory details -----
##
## Fitted regular parameters (theta):
## (Intercept) incin_dist
## 3.8906729971 -0.0000383648
##
## Fitted exp(theta):
## (Intercept) incin_dist
## 48.9438145 0.9999616
## (Intercept) incin_dist
## 3.8906729971 -0.0000383648
## [1] 1879.358
Similarly, since incinerator is a single point, we need to create a square window from (xi - r, yi - r) to (xi + r, yi + r), where r is 20 meters. In this way, we can approximate a polygon.
Here, \(\lambda(x, y)\) is modeled as \(\exp(\beta_0 + \beta_1 \cdot \text{dist_to_incin}(x, y))\).
This is an inhomogeneous Poisson model.
The parameter estimate is -0.000038, but is NOT statistically significant. There is no strong evidence that intensity will increase or decrease with distance. In other words, it suggests the distance from the incinerator does not affect the intensity of larynx cancers.
## Stationary Cox point process model
## Fitted to point pattern dataset 'lung_ppp'
## Fitted by minimum contrast
## Summary statistic: K-function
## Minimum contrast fit (object of class "minconfit")
## Model: Log-Gaussian Cox process
## Covariance model: exponential
## Fitted by matching theoretical K function to lung_ppp
##
## Internal parameters fitted by minimum contrast ($par):
## sigma2 alpha
## 2.534082 802.515371
##
## Fitted covariance parameters:
## var scale
## 2.534082 802.515371
## Fitted mean of log of random intensity: -13.6209
##
## Converged successfully after 87 function evaluations
##
## Starting values of parameters:
## sigma2 alpha
## 1.0000 206.1528
## Domain of integration: [ 0 , 4375 ]
## Exponents: p= 2, q= 0.25
##
## ----------- TREND -----
## Point process model
## Fitted to data: X
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.ppp(Q = X, trend = trend, rename.intercept = FALSE, covariates = covariates,
## covfunargs = covfunargs, use.gam = use.gam, forcefit = TRUE,
## improve.type = ppm.improve.type, improve.args = ppm.improve.args,
## nd = nd, eps = eps)
## Edge correction: "border"
## [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
##
## Data pattern:
## Planar point pattern: 978 points
## Average intensity 4.31e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
##
## Dummy quadrature points:
## 70 x 70 grid of dummy points, plus 4 corner points
## dummy spacing: 250.0000 x 252.8571 units
##
## Original dummy parameters: =
## Planar point pattern: 3659 points
## Average intensity 1.61e-05 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
## (counting weights based on 70 x 70 array of rectangular tiles)
## All weights:
## range: [4270, 63200] total: 2.27e+08
## Weights on data points:
## range: [4270, 31600] total: 17800000
## Weights on dummy points:
## range: [4270, 63200] total: 2.09e+08
## --------------------------------------------------------------------------------
## FITTED :
##
## Stationary Poisson process
##
## ---- Intensity: ----
##
##
## Uniform intensity:
## [1] 4.313092e-06
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) -12.35386 0.03197647 -12.41653 -12.29118 *** -386.342
##
## ----------- gory details -----
##
## Fitted regular parameters (theta):
## (Intercept)
## -12.35386
##
## Fitted exp(theta):
## (Intercept)
## 4.313092e-06
##
## ----------- COX -----------
## Model: log-Gaussian Cox process
##
## Covariance model: exponential
## Fitted covariance parameters:
## var scale
## 2.534082 802.515371
## Fitted mean of log of random intensity: -13.6209
##
## Final standard error and CI
## (allowing for correlation of Cox process):
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) -12.35386 0.237686 -12.81971 -11.888 *** -51.97552
## Generating 78 simulated realisations of fitted cluster model (39 to estimate
## the mean and 39 to calculate envelopes) ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
## 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
## 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
## 78.
##
## Done.
From the plot of K-functions, we can see the observed K(r) (red line) is above the theoretical K(r) (green line), meaning there is clustering.
From the plot of G-functions, we can see observed G(r) is above the theoretical G(r), meaning the point pattern is clustering.
The result align with previous results
The Mean log‐intensity is −13.6209
A non‐zero variance(2.53) indicates that some locations have much higher log‐intensity than others, in other words, clusters exists.
## Stationary Cox point process model
## Fitted to point pattern dataset 'Larynx_ppp'
## Fitted by minimum contrast
## Summary statistic: K-function
## Minimum contrast fit (object of class "minconfit")
## Model: Log-Gaussian Cox process
## Covariance model: exponential
## Fitted by matching theoretical K function to Larynx_ppp
##
## Internal parameters fitted by minimum contrast ($par):
## sigma2 alpha
## 2.011052 693.817641
##
## Fitted covariance parameters:
## var scale
## 2.011052 693.817641
## Fitted mean of log of random intensity: -16.18522
##
## Converged successfully after 61 function evaluations
##
## Starting values of parameters:
## sigma2 alpha
## 1.000 1442.361
## Domain of integration: [ 0 , 4375 ]
## Exponents: p= 2, q= 0.25
##
## ----------- TREND -----
## Point process model
## Fitted to data: X
## Fitting method: maximum likelihood (Berman-Turner approximation)
## Model was fitted using glm()
## Algorithm converged
## Call:
## ppm.ppp(Q = X, trend = trend, rename.intercept = FALSE, covariates = covariates,
## covfunargs = covfunargs, use.gam = use.gam, forcefit = TRUE,
## improve.type = ppm.improve.type, improve.args = ppm.improve.args,
## nd = nd, eps = eps)
## Edge correction: "border"
## [border correction distance r = 0 ]
## --------------------------------------------------------------------------------
## Quadrature scheme (Berman-Turner) = data + dummy + weights
##
## Data pattern:
## Planar point pattern: 58 points
## Average intensity 2.56e-07 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
##
## Dummy quadrature points:
## 32 x 32 grid of dummy points, plus 4 corner points
## dummy spacing: 546.875 x 553.125 units
##
## Original dummy parameters: =
## Planar point pattern: 794 points
## Average intensity 3.5e-06 points per square unit
## Window: polygonal boundary
## single connected closed polygon with 11 vertices
## enclosing rectangle: [346600, 364100] x [412600, 430300] units
## (17500 x 17700 units)
## Window area = 226985000 square units
## Fraction of frame area: 0.733
## Quadrature weights:
## (counting weights based on 32 x 32 array of rectangular tiles)
## All weights:
## range: [9700, 302000] total: 2.27e+08
## Weights on data points:
## range: [60500, 151000] total: 7960000
## Weights on dummy points:
## range: [9700, 302000] total: 2.19e+08
## --------------------------------------------------------------------------------
## FITTED :
##
## Stationary Poisson process
##
## ---- Intensity: ----
##
##
## Uniform intensity:
## [1] 2.555892e-07
##
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) -15.17969 0.1313064 -15.43705 -14.92234 *** -115.6051
##
## ----------- gory details -----
##
## Fitted regular parameters (theta):
## (Intercept)
## -15.17969
##
## Fitted exp(theta):
## (Intercept)
## 2.555892e-07
##
## ----------- COX -----------
## Model: log-Gaussian Cox process
##
## Covariance model: exponential
## Fitted covariance parameters:
## var scale
## 2.011052 693.817641
## Fitted mean of log of random intensity: -16.18522
##
## Final standard error and CI
## (allowing for correlation of Cox process):
## Estimate S.E. CI95.lo CI95.hi Ztest Zval
## (Intercept) -15.17969 0.2310018 -15.63245 -14.72694 *** -65.71245
## Generating 78 simulated realisations of fitted cluster model (39 to estimate
## the mean and 39 to calculate envelopes) ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
## 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
## 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
## 78.
##
## Done.
From the plot of K-functions, we can see the observed K(r) (red line) is above, but very close to theoretical K(r) (green line), meaning there is a little bit clustering. The point pattern is mainly random.
From the plot of G-functions, we can see observed G(r) is very close to theoretical G(r), meaning the point pattern is mainly random.
The result align with previous results
The Mean log‐intensity is −16.18
A non‐zero variance(2.011) indicates that some locations have much higher log‐intensity than others, in other words, clusters exists.
when minPts = 3 and eps = 2km
## DBSCAN clustering for 978 objects.
## Parameters: eps = 2000, minPts = 3
## Using euclidean distances and borderpoints = TRUE
## The clustering contains 1 cluster(s) and 2 noise points.
##
## 0 1
## 2 976
##
## Available fields: cluster, eps, minPts, metric, borderPoints
when minPts = 3 and eps = 0.5km
## DBSCAN clustering for 978 objects.
## Parameters: eps = 500, minPts = 3
## Using euclidean distances and borderpoints = TRUE
## The clustering contains 21 cluster(s) and 76 noise points.
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
## 76 222 258 163 8 22 28 48 7 5 17 23 3 7 26 18 24 8 6 3
## 20 21
## 3 3
##
## Available fields: cluster, eps, minPts, metric, borderPoints
| minPts = 3 and eps = 2km | minPts = 3 and eps = 0.5km | |
|---|---|---|
| Result | All points into one cluster except two points. The chosen parameters considered most points to be part of one big polygon. | There are more clusters, capturing more local clusters rather than forcing one giant cluster. |
| Number of clusters | 1 | 21 |
| Number of noise points | 2 | 76 |
The number of clusters can be very different when eps is very different. Tuning this parameter can leading to different results and interpretation, so that more patterns can be discovered and analyzed.
## HDBSCAN clustering for 978 objects.
## Parameters: minPts = 3
## The clustering contains 32 cluster(s) and 55 noise points.
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
## 55 9 12 3 23 174 9 7 6 9 3 3 8 5 3 6 33 15 19 42
## 20 21 22 23 24 25 26 27 28 29 30 31 32
## 35 3 14 13 3 27 4 18 5 3 26 5 378
##
## Available fields: cluster, minPts, coredist, cluster_scores,
## membership_prob, outlier_scores, hc
32 clusters and 55 noise points
Instead of a fixed eps, HDBSCAN can use multiple values of eps and builds a hierarchical tree. This can reveal clusters at different density levels.
From the hierarchical tree, we can see: at small eps levels, there are more clusters. As you increases eps, some clusters merge into larger clusters.
By HDBSCAN, different regions of the domain can end up split or merged at different values of eps. So the the final clusters can be seen as forming at locally appropriate density levels.